EPIDEMIOLOGY AND HEALTH DATA INSIGHTS

Keyword: Machine Learning

Review Article
Developing and Validating Predictive Models for Adverse Drug Reactions using Electronic Health Records (EHRs): A Narrative Review
Epidemiology and Health Data Insights, 1(6), 2025, ehdi019, https://doi.org/10.63946/ehdi/17369
ABSTRACT: Adverse drug reactions (ADRs) remain a major global challenge, contributing substantially to patient morbidity, mortality, and healthcare costs. Traditional pharmacovigilance approaches—spontaneous reporting and post-marketing surveillance—are hampered by underreporting, delays, and limited contextual data. The growing availability of electronic health records (EHRs), which capture longitudinal structured and unstructured patient information, presents an unprecedented opportunity to advance ADR prediction. This narrative review synthesizes recent progress in developing and validating predictive models that leverage EHRs, highlighting methodological approaches, challenges, and future directions. Predictive strategies range from traditional regression models to advanced machine learning and deep learning architectures, with multimodal frameworks increasingly integrating structured fields (demographics, labs, prescriptions) and unstructured clinical text through natural language processing. While ensemble and deep learning methods demonstrate superior performance, issues of data quality, missingness, bias, and interpretability persist. Robust validation frameworks—spanning internal cross-validation to multi-center external testing—are critical to ensure generalizability and clinical trustworthiness. Ethical considerations, including fairness, privacy, and transparency, remain central to safe deployment. Looking forward, promising avenues include federated learning across institutions, integration of multi-omics and pharmacogenomic data, explainable AI tailored for clinical use, and real-time monitoring through digital twin frameworks. These trajectories, combined with robust governance and clinician–data scientist collaboration, have the potential to transform ADR detection from a reactive process to proactive, personalized prevention. 
By synthesizing the existing evidence, this review provides insights into the development of more effective predictive models for ADRs and informs strategies for improving pharmacovigilance. It contributes to ongoing efforts to leverage EHRs and predictive models to improve patient outcomes and reduce the burden of ADRs.
Methodological Paper
Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data
Epidemiology and Health Data Insights, 1(4), 2025, ehdi015, https://doi.org/10.63946/ehdi/17159
ABSTRACT: Chronic diseases remain a leading cause of global mortality, underscoring the need for reliable mortality prediction models to guide individualized treatment and optimize resource allocation. This methodological note presents a reproducible framework for predicting one-year mortality in chronic disease patients using large-scale administrative healthcare data. The approach employs a retrospective cohort design, year-specific subcohorts, and stratified 5-fold cross-validation across a broad range of machine learning models. Performance is assessed with multiple metrics, including AUC, sensitivity, specificity, and balanced accuracy, to account for class imbalance. Model interpretability is enhanced through SHapley Additive exPlanations (SHAP), enabling identification of key mortality predictors and their directional impact. The proposed framework is general and can be applied to different chronic diseases. It has already been successfully demonstrated in nationwide cohorts of patients with diabetes mellitus and chronic viral hepatitis in Kazakhstan, achieving AUC values of 0.74–0.80, comparable to international benchmarks despite relying on administrative data alone. The method is scalable and adaptable, allowing integration of laboratory and clinical data with feature selection to address high-dimensionality challenges. Its generalizability and clinical relevance, however, should be validated in practice using enriched datasets across additional chronic diseases and diverse populations.
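The evaluation scheme named in this abstract (stratified 5-fold cross-validation scored with AUC, sensitivity, specificity, and balanced accuracy) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation: the abstract does not specify their models, features, or software, so the function names (`stratified_folds`, `confusion_metrics`, `auc`, `run_cv`), the single-feature threshold scorer, and the synthetic data are all hypothetical stand-ins chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and balanced accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"sensitivity": sens, "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2}


def auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def run_cv(x, y, k=5, seed=0):
    """Cross-validate a toy scorer (ASSUMPTION: stand-in for a real model).

    The "model" thresholds one feature at the midpoint of the two class
    means estimated on the training folds only.
    """
    results = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        pos = [x[i] for i in train_idx if y[i] == 1]
        neg = [x[i] for i in train_idx if y[i] == 0]
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        y_test = [y[i] for i in test_idx]
        s_test = [x[i] for i in test_idx]
        metrics = confusion_metrics(y_test, [int(s > threshold) for s in s_test])
        metrics["auc"] = auc(y_test, s_test)
        results.append(metrics)
    return results


if __name__ == "__main__":
    # Synthetic, imbalanced data (10% positives); purely illustrative.
    rng = random.Random(42)
    y = [1] * 20 + [0] * 180
    x = [rng.gauss(1.0, 1.0) if label else rng.gauss(0.0, 1.0) for label in y]
    for fold_number, m in enumerate(run_cv(x, y), start=1):
        print(fold_number, {name: round(value, 3) for name, value in m.items()})
```

Reporting balanced accuracy alongside sensitivity and specificity matters for imbalanced outcomes such as one-year mortality, because a classifier that always predicts survival would still reach high raw accuracy. Stratifying the folds keeps the rare positive class present in every test split, so each fold's AUC is defined.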